Abstract
The purpose of this work is to test the **APT** and the **Roll Model** with high-frequency prices from a Bitcoin order book. The models are tested with different objectives that rely on market microstructure dynamics.
The **APT** model aims to model the value of a risky asset based on different economic assumptions and a consumption framework; the model formulation is the result of an optimization process. The **APT** model describes the price of the risky asset as a Martingale, a discrete stochastic process which states that the best expectation of the value of the risky asset is its previous value.
The **Roll Model**'s main contribution is to price the transaction cost spread in a market without posted bid/ask prices, i.e. a market not based on orders, where the best bid and ask are given by a dealer/market maker. The model states that the spread, denoted _c_, is constant through time; it also implies that the probability distribution of the market direction, denoted 1 and -1 for a buy or a sell respectively, is 50% each. Finally, the model introduces an efficient price which already prices in all the fundamental information of the asset.
To test the **APT Model**, the fulfillment of the martingale property is checked using the mid price and the weighted mid price of the whole order book. With these two variables the property is tested by grouping occurrences by minute; results are presented in a dataframe with the frequency of scenarios and their proportion relative to the total observations per minute, with visual support also incorporated. To test the **Roll Model**, the spread is calculated using the mid price; with that spread, theoretical bid and ask prices are calculated and compared against the real bid/ask and the real mid, in order to visualize the real spread against the one calculated by the model. The assumption of no correlation between price changes is also tested.
The results for the **APT** model were consistent in terms of the proportion of samples that fulfilled the model assumption, for both prices and for the minute grouping. However, the model was not accurate from an empirical perspective, because the assumption was not fulfilled 100% of the time or for all the samples; rounding, the general proportion was 70/30. The results for the **Roll Model** contrast with the observed data: the model spread is constant through time, which is not accurate compared with real data, where the spread is observed to contract or widen over time. Some negative autocorrelation between price changes was also found.
This is an activity for the subject Microstructure & Trading Systems of the B.S. Financial Engineering program at ITESO. The purpose of this work is to apply introductory models with a solid economic and statistical framework to model asset dynamics in a high-frequency context. The models are applied to order books of a cryptocurrency pair, BTC/USDT; model assumptions are tested from an empirical approach, and results are presented together with visualizations.
The Python version in which this project was done is Python 3.8.13.
In order to run this notebook, it is necessary to install the packages listed in the requirements.txt file:
The following are the file dependencies needed to run this notebook:
%%capture
# Install all the pip packages in the requirements.txt
import sys
!{sys.executable} -m pip install -r requirements.txt
# import files & libraries
import data as dt
import functions as fn
import numpy as np
import pandas as pd
import itertools
import plotly.io as pio
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import plotly.offline as pyo
import visualizations as vn
pyo.init_notebook_mode()
# import data
ob_data = dt.ob_data
keys = list(ob_data.keys())
book1 = pd.DataFrame(ob_data[keys[0]])
display(book1.head(),book1.tail())
| bid_size | bid | ask | ask_size | |
|---|---|---|---|---|
| 0 | 0.000400 | 28270.0 | 28275.0 | 0.025405 |
| 1 | 0.009787 | 28269.0 | 28276.0 | 0.516810 |
| 2 | 0.008168 | 28268.0 | 28277.0 | 0.005044 |
| 3 | 0.995787 | 28266.0 | 28278.0 | 0.377374 |
| 4 | 1.038704 | 28265.0 | 28280.0 | 1.179715 |
| bid_size | bid | ask | ask_size | |
|---|---|---|---|---|
| 20 | 2.809104 | 28244.0 | 28296.0 | 3.424052 |
| 21 | 0.756619 | 28243.0 | 28297.0 | 0.005064 |
| 22 | 0.697787 | 28242.0 | 28298.0 | 1.192474 |
| 23 | 0.377316 | 28241.0 | 28299.0 | 0.847424 |
| 24 | 0.010000 | 28240.0 | 28300.0 | 0.802599 |
Data was extracted from the bitfinex exchange.
The columns that constitute each order book are:
print(f"The imported data, defined as 'ob_data', is a dictionary that contains {len(keys)} order books.")
print("The traded asset in the order books is Bitcoin and its bid/ask price is quoted in USDT.")
print(f"The order books recorded are from {keys[0]} to {keys[-1]}.")
print("Data granularity goes up to milliseconds.")
The imported data, defined as 'ob_data', is a dictionary that contains 2401 order books. The traded asset in the order books is Bitcoin and its bid/ask price is quoted in USDT. The order books recorded are from 2021-07-05T13:06:46.571Z to 2021-07-05T14:06:46.412Z. Data granularity goes up to milliseconds.
To show some metrics and statistics of the data, the f_descriptive_ob function is imported from the functions file, and the visualization functions are imported from the visualizations file.
For general descriptive purposes the metrics shown are:
For a brief descriptive introduction of the variables involved in subsequent processes, such as the mid price, the weighted mid price and the order book imbalance, the following information is shown:
help(fn.f_descriptive_ob)
Help on function f_descriptive_ob in module functions:
f_descriptive_ob(data_ob: dict) -> dict
Docstring
Parameters
----------
data_ob : dict
Orderbook as the input data, a dictionary with the following structure:
'timestamp': timestamp object recognized by machine, e.g. pd.to_datetime()
'bid_size': volume for bid levels
'bid': bid price
'ask': ask price
'ask_size': volume of ask levels
Returns
-------
r_data: dict
Dictionary with the following metrics.
'median_ts_ob':list containing float
'midprice':list containing float
'spread':list containing float
'No. of price levels': list containing int
'Bid Volume':list containing float
'Ask Volume': list containing float
'Total Volume':list containing float
'Orderbook Imbalance':list containing float
'Weighted midprice':list containing float
'VWAP Volume Weighted Average Price':list containing float
'OHLCV': DataFrame of OHLCV sampled by hour, shape [2,5]
columns: opening, close, minimum and maximum (calculated from orderbook midprice),
volume (calculated as the total volume)
'stats_ob_imbalance': Dataframe containing the following statistical moments
of the orderbook imbalance: Median, Variance, Skewness, Kurtosis.
'bid': bid price.
'ask': ask price.
-------
#ob metrics
data_1 = fn.f_descriptive_ob(data_ob=ob_data)
print(f"For this particular set of order books, all of them contain the same number of levels, which is: {set(data_1['No. of price levels']).pop()}")
print(f"The median time between order book updates is {data_1['median_ts_ob']} milliseconds.")
For this particular set of order books, all of them contain the same number of levels, which is: 25 The median time between order book updates is 1500.0 milliseconds.
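As a brief illustration of where this figure comes from, the median update time can be sketched from the timestamp keys themselves. This is a minimal example using three of the timestamps shown above; the actual computation lives in functions.f_descriptive_ob.

```python
import pandas as pd

# Three of the order book timestamps shown earlier
keys = ["2021-07-05T13:06:46.571Z",
        "2021-07-05T13:06:47.918Z",
        "2021-07-05T13:06:49.414Z"]

# Parse the ISO strings, take consecutive differences, convert to milliseconds
ts = pd.to_datetime(keys)
deltas_ms = ts.to_series().diff().dropna().dt.total_seconds() * 1000
print(deltas_ms.median())  # ~1421.5 ms for these three timestamps
```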
help(vn.plot_orderbook)
Help on function plot_orderbook in module visualizations:
plot_orderbook(book)
Limit OrderBook horizontal bars plot.
Parameters
----------
book: dictionary of an orderbook compound of the following variables:
bid_size, bid, ask_size, ask.
Returns
-------
A plotly graph with the orderbook levels (bid/ask) on the X axis.
The Y axis represents the volume.
vn.plot_orderbook(book1)
This chart corresponds to the first order book.
In general it can be seen that there is more volume on the bid side. This can be confirmed with the total volume for bid and ask for this book; calculations are shown below. Also, at the first levels the volume is higher on the buy side. The highest level in terms of volume can be observed on the bid side, at a price of 28,255 and a volume of 16.08 BTC.
print(f" Total bid volume for the first order book {data_1['Bid Volume'][0]}")
print(f" Total ask volume for the first order book {data_1['Ask Volume'][0]}")
Total bid volume for the first order book 32.100178 Total ask volume for the first order book 31.699457
Definition:
The average of the ask and bid price at the top of the book.
Variables notation:
Formula
$$ {mid} = \frac{ask_{i,0}+bid_{i,0}}{2} $$
help(vn.plot_ts)
Help on function plot_ts in module visualizations:
plot_ts(y: list, x: list, variable_name: str)
Univariate time series plot function.
Parameters
----------
y : list
which contains the values of the variable of interest.
x : list
which contains the timestamps.
variable_name : str
The name you want to use for your variable of interest.
Returns
-------
plotly plot.
vn.plot_ts(y=data_1['midprice'], x=keys, variable_name='Mid Price')
mid_stats=pd.DataFrame(data_1['midprice'],columns=['stats']).describe()
mid_stats
| stats | |
|---|---|
| count | 2401.000000 |
| mean | 28351.878592 |
| std | 42.215634 |
| min | 28270.000000 |
| 25% | 28315.500000 |
| 50% | 28349.500000 |
| 75% | 28384.500000 |
| max | 28444.500000 |
As can be seen, the mean of the mid price in the time series was 28,351 USDT, the standard deviation 42.21 USDT, the minimum mid price 28,270 and the maximum 28,444.5.
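The mid price behind these statistics is just the top-of-book average. A minimal sketch, assuming a hypothetical DataFrame structured like book1 above, where row 0 holds the best level:

```python
import pandas as pd

# Hypothetical top of the book, structured like book1
book = pd.DataFrame({"bid_size": [0.0004, 0.0097],
                     "bid": [28270.0, 28269.0],
                     "ask": [28275.0, 28276.0],
                     "ask_size": [0.0254, 0.5168]})

# Mid price: average of the best ask and the best bid (row 0 = top of book)
mid = (book["ask"].iloc[0] + book["bid"].iloc[0]) / 2
print(mid)  # 28272.5
```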
Definition:
The average of the ask and bid price at the top of the book, multiplied by the imbalance of the book. In this case, since the imbalance is in terms of the bid, it is the mid price weighted by the bid volume proportion of the order book.
Variables notation:
Formula
$$ w_{mid} = \frac{\sum_{i}^{n} bid_{size} }{\sum_{i}^{n}bid_{size}+\sum_{i}^{n}ask_{size}}*\frac{ask_{i,0}+bid_{i,0}}{2} $$
$$ w_{mid} = imbalance * mid $$
vn.plot_ts(y=data_1['weighted_midprice'], x=keys, variable_name='Weighted Mid Price')
wmid_stats=pd.DataFrame(data_1['weighted_midprice'],columns=['stats']).describe()
wmid_stats
| stats | |
|---|---|
| count | 2401.000000 |
| mean | 10421.096231 |
| std | 3162.565763 |
| min | 2619.021876 |
| 25% | 8183.809600 |
| 50% | 10162.518809 |
| 75% | 12373.587760 |
| max | 24138.998296 |
Looking at the statistics, it is noticeable that the weighted mid price has more variance, in terms of the standard deviation, compared with the mid price. The reason behind this behaviour could be the addition of the order book dynamics in terms of volume, which are taken into account by multiplying the mid price by the imbalance.
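A minimal sketch of the weighted mid price as the product of the imbalance and the mid price, using a hypothetical two-level book (the sizes are made up for illustration):

```python
import pandas as pd

# Hypothetical two-level book
book = pd.DataFrame({"bid_size": [0.4, 0.6],
                     "bid": [28270.0, 28269.0],
                     "ask": [28275.0, 28276.0],
                     "ask_size": [0.5, 0.5]})

mid = (book["ask"].iloc[0] + book["bid"].iloc[0]) / 2
# Imbalance: total bid volume as a proportion of total volume (bid perspective)
imbalance = book["bid_size"].sum() / (book["bid_size"].sum() + book["ask_size"].sum())
w_mid = imbalance * mid
print(imbalance, w_mid)  # 0.5 14136.25
```

The complement 1 - imbalance gives the same measure from the ask perspective.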
Definition:
The imbalance of the order book tells us where the order book is biased in terms of volume; it can be towards the ask or the bid. In this case the formula is in terms of the bid; therefore, if we want to look at it in terms of the ask, the complement $1-imbalance$ is the imbalance seen from the ask perspective.
Variables notation:
Formula
$$ {imbalance} = \frac{\sum_{i}^{n} bid_{size} }{\sum_{i}^{n}bid_{size}+\sum_{i}^{n}ask_{size}} $$
help(vn.plot_boxplot)
Help on function plot_boxplot in module visualizations:
plot_boxplot(y: list, variable_name: str)
Univariate Boxplot function with plotly.
Parameters
----------
y : list
which contains the values of the variable of interest.
variable_name : str
The name you want to use for your variable of interest.
Returns
-------
plotly boxplot.
vn.plot_boxplot(y=data_1['orderbook_imbalance'], variable_name='Order Book Imbalance ')
In the chart it can be observed that although the median is 0.35 in terms of the bid, there are some outliers where the order book can be more skewed to the bid side. Below, the median, variance, skew and kurtosis are shown.
imb_st=data_1['stats_ob_imbalance']
print('Main statistics of the order book imbalance are seen below')
display(imb_st)
Main statistics of the order book imbalance are seen below
| Median | Variance | Skew | Kurtosis | |
|---|---|---|---|---|
| 1 | 0.357747 | 0.012448 | 0.392483 | 0.196053 |
The Asset Pricing Theory is a theoretical framework for asset valuation derived from an economic perspective.
This model is mostly used for assets that could bring a payment in the future, for example a stock that pays dividends. The model begins with a simple formulation:
$$ X_{t+1} = P_{t+1} + d_{t+1} $$
Where:
The payment is given to an economic agent who acts under the following conditions, modeled by a utility function in terms of consumption:
$$ U(c_t,c_{t+1}) = U(c_t) + {\beta}U(c_{t+1}) $$
Where:
Constraints:
$k$ represents the number of units of the risky asset; the agent can take long positions ${k}$ or short positions ${-k}$, in such a way that the constraints can be modeled through these equations:
$$ c_t = e_t - p_t k $$
$$ c_{t+1} = e_{t+1} + x_{t+1} k $$
It's important to mention that $e_{t+1}$ is an unknown variable, so the expected value operator is required for the following optimization problem:
$$ \max_{k} \; U(c_t) + \mathbb{E}[{\beta}U(c_{t+1})] $$
After the optimization calculations:
$$ U'(C_t)P_t = \mathbb{E}[{\beta}U'(C_{t+1})X_{t+1}] $$
Where:
Finally the model results in:
$$ P_t = \mathbb{E}[m_{t+1}X_{t+1}] $$
Where:
The payment would be equal to the future stock price $ P_{t+1}$ plus the dividends $d_{t+1}$.
$$ X_{t+1} = P_{t+1} + d_{t+1} $$
Let's remember that this analysis is focused on market microstructure, where transactions are done in short time intervals. Then:
Therefore:
$$ X_{t+1} = P_{t+1} + 0 $$
$$ X_{t+1} = P_{t+1} $$
$$ P_t = \mathbb{E}[(1)(1)P_{t+1}] $$
Finally, the price of a risky asset derived from a consumption model in the market microstructure context is modeled as a Martingale stochastic process.
$$ P_t = \mathbb{E}[P_{t+1}] $$
The objective of this process is to test, from an empirical perspective, the martingale property in the market microstructure. In this experiment $P_t$ is the mid price calculated from the Top of the Book of each order book.
The two scenarios to be tested for this sample are:
The number of times each scenario took place and its proportion with respect to the total are discussed in the Results Section 6.1.
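The counting logic described above can be sketched as follows, with hypothetical prices; the actual implementation is f_martingale in the functions file.

```python
# e1: P_t == P_{t+1} (martingale scenario holds), e2: P_t != P_{t+1}
prices = [28272.5, 28272.5, 28276.5, 28276.5, 28276.5, 28280.0]

# Compare each price with its successor and count both scenarios
pairs = list(zip(prices[:-1], prices[1:]))
e1 = sum(p_t == p_next for p_t, p_next in pairs)
e2 = len(pairs) - e1
print({"e1": e1, "e2": e2, "proportion e1": e1 / len(pairs)})  # e1=3, e2=2, 0.6
```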
help(fn.f_martingale)
Help on function f_martingale in module functions:
f_martingale(data_ob: dict, price: str = 'midprice', interval: str = 'None') -> dict
Parameters
----------
data_ob : dict
Orderbook as the input data, a dictionary with the following structure:
'timestamp': timestamp object recognized by machine, e.g. pd.to_datetime()
'bid_size': volume for bid levels
'bid': bid price
'ask': ask price
'ask_size': volume of ask levels
price : str
Defines the type of price you want to test for your analysis purpose.
By default it is set to work with the midprice, but it can also work
with the weighted_midprice.
For the weighted midprice the argument is 'weighted_midprice'.
For the midprice it is 'midprice'.
If the argument is not specified, the calculation is done with the
midprice by default.
interval : str
The martingale property can be tested with all the prices in the
Order Book, or it can be done by minute intervals for the first hour.
For the interval in minutes the argument should be 'minutes'.
If the argument is not given, by default the calculation is done
with all the prices of the order book.
Returns
-------
dict
When the calculation is done without the minute interval, the function
returns a dictionary with the following keys:
e1: a dictionary with the number of occurrences where the
first scenario (e1) happened, and the proportion with respect
to the total experiment trials.
e1 = price_t == price_t+1
e2: a dictionary with the number of occurrences where the
second scenario (e2) happened, and the proportion with respect
to the total experiment trials.
e2 = price_t != price_t+1
DataFrame
When the calculation is done by minute interval, the function returns
a pandas dataframe compound of the following columns:
interval: The minute.
total: The total number of trials for each minute.
e1: Number of trials that fulfill the first scenario.
e2: Number of trials that fulfill the second scenario.
proportion e1: The proportion of trials that fulfill the first
scenario with respect to the total trials for each minute.
proportion e2: The proportion of trials that fulfill the second
scenario with respect to the total trials for each minute.
mid_exp1 = fn.f_martingale(data_ob=dt.ob_data)
print('Experiment results for Martingale test with the Mid Price sample.')
pd.DataFrame(mid_exp1)
Experiment results for Martingale test with the Mid Price sample.
| e1 | e2 | total | |
|---|---|---|---|
| cantidad | 1763.00 | 637.00 | 2400 |
| proporcion | 0.73 | 0.27 | 2400 |
The objective of this process is to test, from an empirical perspective, the martingale property in the market microstructure. In this experiment $P_t$ is the mid price calculated from the Top of the Book of each order book.
This time the experiment is done over time intervals, in this case every minute of the first hour of data; the martingale property is tested by grouping the mid prices by the minute in which they took place.
The two scenarios to be tested for this sample are:
The number of times each scenario took place and its proportion with respect to the total are discussed in the Results Section 6.2, incorporating visual support.
mid_exp1_h = fn.f_martingale(data_ob=dt.ob_data,price='midprice',interval='minutes')
print('Experiment results for Martingale test with the Mid Price sample group by minute.')
display(mid_exp1_h.head(),mid_exp1_h.tail())
Experiment results for Martingale test with the Mid Price sample group by minute.
| interval | total | e1 | e2 | proportion e1 | proportion e2 | |
|---|---|---|---|---|---|---|
| 0 | 0 | 40 | 28 | 12 | 0.70 | 0.30 |
| 1 | 1 | 39 | 28 | 11 | 0.72 | 0.28 |
| 2 | 2 | 38 | 30 | 8 | 0.79 | 0.21 |
| 3 | 3 | 40 | 33 | 7 | 0.82 | 0.18 |
| 4 | 4 | 39 | 27 | 12 | 0.69 | 0.31 |
| interval | total | e1 | e2 | proportion e1 | proportion e2 | |
|---|---|---|---|---|---|---|
| 55 | 55 | 39 | 27 | 12 | 0.69 | 0.31 |
| 56 | 56 | 38 | 27 | 11 | 0.71 | 0.29 |
| 57 | 57 | 40 | 31 | 9 | 0.78 | 0.22 |
| 58 | 58 | 39 | 30 | 9 | 0.77 | 0.23 |
| 59 | 59 | 38 | 28 | 10 | 0.74 | 0.26 |
The objective of this process is to test, from an empirical perspective, the martingale property in the market microstructure. In this experiment $P_t$ is the weighted mid price calculated as the product of the mid price and the order book imbalance.
The two scenarios to be tested for this sample are:
The number of times each scenario took place and its proportion with respect to the total are discussed in the Results Section 6.3.
wmid_exp2 = fn.f_martingale(data_ob=dt.ob_data,price='weighted_midprice')
print('Experiment results for Martingale test with the Weighted Mid Price sample.')
pd.DataFrame(wmid_exp2)
Experiment results for Martingale test with the Weighted Mid Price sample.
| e1 | e2 | total | |
|---|---|---|---|
| cantidad | 1616.00 | 775.00 | 2391 |
| proporcion | 0.68 | 0.32 | 2391 |
The objective of this process is to test, from an empirical perspective, the martingale property in the market microstructure. In this experiment $P_t$ is the weighted mid price calculated as the product of the mid price and the order book imbalance.
This time the experiment is done over time intervals, in this case every minute of the first hour of data; the martingale property is tested by grouping the weighted mid prices by the minute in which they took place.
The two scenarios to be tested for this sample are:
The number of times each scenario took place and its proportion with respect to the total are discussed in the Results Section 6.4, incorporating visual support.
wmid_exp2_h = fn.f_martingale(data_ob=dt.ob_data, price='weighted_midprice', interval='minutes')
print('Experiment results for Martingale test with the Weighted Mid Price sample group by minute.')
display(wmid_exp2_h.head(),wmid_exp2_h.tail())
Experiment results for Martingale test with the Weighted Mid Price sample group by minute.
| interval | total | e1 | e2 | proportion e1 | proportion e2 | |
|---|---|---|---|---|---|---|
| 0 | 0 | 40 | 28 | 12 | 0.70 | 0.30 |
| 1 | 1 | 39 | 26 | 13 | 0.67 | 0.33 |
| 2 | 2 | 38 | 28 | 10 | 0.74 | 0.26 |
| 3 | 3 | 40 | 27 | 13 | 0.68 | 0.32 |
| 4 | 4 | 39 | 26 | 13 | 0.67 | 0.33 |
| interval | total | e1 | e2 | proportion e1 | proportion e2 | |
|---|---|---|---|---|---|---|
| 55 | 55 | 39 | 26 | 13 | 0.67 | 0.33 |
| 56 | 56 | 38 | 27 | 11 | 0.71 | 0.29 |
| 57 | 57 | 40 | 27 | 13 | 0.68 | 0.32 |
| 58 | 58 | 39 | 26 | 13 | 0.67 | 0.33 |
| 59 | 59 | 38 | 26 | 12 | 0.68 | 0.32 |
The Roll Model presents a distinction between price components related to the fundamentals of a security, and those attributable to the market trading process, which are transitory. Price components based on fundamentals are modeled through a random walk.
$$ P_{t} = P_{t-1} + \mu + u_{t} $$
Where:
As shown empirically below, $\mu$ is close to zero; therefore this term is excluded.
So as in the APT model:
$$ \mathbb{E}[P_{t+1}] = P_{t} $$
It turns out to be a martingale stochastic process.
But in microstructure analysis, as discussed in the results section, prices are usually not martingales. This is where the Roll Model offers an important contribution to the modelling of the market trading process, pricing the transaction cost that a dealer is charging.
For the market trading process, the Roll Model keeps the random walk, but instead of the actual transaction price ${p_t}$ it uses the efficient price ${m_t}$; in a market with complete information, this price incorporates all the new information that appears.
$$ m_t = m_{t-1} + u_t $$
Assumptions:
At time $t$ the transaction price is $p_t = m_t + q_{t}c$.
The Roll Model has two parameters:
These can be estimated from the variance and the first-order autocovariance of the price changes $\Delta{p_t}$.
Variance:
$$ \gamma_0 = Var(\Delta{p_t}) = 2c^2 + \sigma^2_{u} $$
Autocovariance:
$$ \gamma_1 = Cov(\Delta{p_{t-1}},\Delta{p_t}) = -c^2 $$
These parameters will be calculated with the data from the order book, where $p_t$ is the mid price.
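To illustrate both relations, a toy simulation (an illustration not taken from the project code) can generate Roll Model prices and recover $c$ from the estimated $\gamma_1$:

```python
import numpy as np

# Toy Roll Model simulation: random-walk efficient price m_t,
# transaction price p_t = m_t + q_t * c with q_t = +/-1 equally likely
rng = np.random.default_rng(42)
c, sigma_u, n = 0.5, 1.0, 100_000
m = np.cumsum(rng.normal(0.0, sigma_u, n))   # efficient price
q = rng.choice([-1, 1], size=n)              # trade direction
p = m + q * c                                # observed transaction price

dp = np.diff(p)
gamma1 = np.cov(dp[:-1], dp[1:])[0, 1]       # first-order autocovariance
c_hat = np.sqrt(-gamma1)                     # invert gamma_1 = -c^2
print(gamma1, c_hat)                         # gamma1 near -0.25, c_hat near 0.5
```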
The model spread will be compared with the last spread observed in the historical data.
The absence of correlation between the $\Delta{p_t}$ will also be tested.
Finally, in the Results section, as a form of comparison between the real spread and the model spread, the autocovariance of the time series is calculated at each point in time to obtain the model spread, in order to contrast the model with the real data.
The autocovariance function calculates the autocovariance with the following formula:
$$\frac{1}{N-k} \sum_{t=1}^{N-k} (Y_t - \bar{Y})(Y_{t+k} - \bar{Y})$$
where:
The autocorrelation function calculates the autocorrelation with the following formula:
$$\frac{\sum_{t=1}^{N-k} (r_t - \bar{r})(r_{t+k} - \bar{r})}{\sum_{t=1}^{N} (r_t - \bar{r})^2}$$
where:
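The two formulas above can be sketched directly; these are hypothetical minimal versions, while the project's own implementations are auto_cov_delta and auto_corr_delta, documented below.

```python
import numpy as np

def auto_cov(y, k=1):
    # Sample autocovariance at lag k: mean-centered cross products of the
    # series with its k-lagged copy, divided by N - k
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    return np.sum((y[k:] - ybar) * (y[:-k] - ybar)) / (len(y) - k)

def auto_corr(y, k=1):
    # Sample autocorrelation at lag k: lagged cross products over the
    # full-sample sum of squared deviations
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    return np.sum((y[k:] - ybar) * (y[:-k] - ybar)) / np.sum((y - ybar) ** 2)

# An alternating series of price changes is strongly negatively autocorrelated
deltas = [1.0, -1.0, 1.0, -1.0, 1.0]
print(auto_cov(deltas), auto_corr(deltas))  # approx. -0.96 and -0.8
```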
help(fn.auto_cov_delta)
Help on function auto_cov_delta in module functions:
auto_cov_delta(ts: list, lag: int = 1) -> float
The purpose of this function is to calculate the autocovariance of
a time series with k lags.
Parameters
----------
ts : list of data (only the values, without the timestamp)
lag : int, optional
The number of lags. The default is 1.
Returns
-------
the autocovariance of the series with its k lags (float value).
help(fn.auto_corr_delta)
Help on function auto_corr_delta in module functions:
auto_corr_delta(ts: list, lag: int = 1) -> float
The purpose of this function is to calculate the autocorrelation of
a time series with k lags.
Parameters
----------
ts : list of data (only the values, without the timestamp)
lag : int, optional
The number of lags. The default is 1.
Returns
-------
the autocorrelation of the series with its k lags (float value).
help(fn.roll_model)
Help on function roll_model in module functions:
roll_model(gamma1: float) -> float
The objective of this function is to calculate the spread given by
the roll model.
Parameters
----------
gamma1 : float
first-order autocovariance of the changes of the variable of interest.
Returns
-------
float
The spread priced by the Roll Model.
# parameters calculation
pt = data_1['midprice']
gamma1 = fn.auto_cov_delta(pt)[0]
model_spread=fn.roll_model(gamma1)
gamma0=2*model_spread**2
real_spread=data_1['spread'][-1]
print(f'The spread given by the Roll Model is {model_spread} and the observed spread for this timestamp is {real_spread}')
print(f'Correlation of the price changes is: {gamma1/np.var(fn.auto_cov_delta(pt)[1])}')
print('Results are discussed in the Results section 6.5.')
The spread given by the Roll Model is 0.07002917274559008 and the observed spread for this timestamp is 5.0 Correlation of the price changes is: -0.0001456943097445253 Results are discussed in the Results section 6.5.
help(fn.roll_model_ts)
Help on function roll_model_ts in module functions:
roll_model_ts(ts: list) -> list
The objective of this function is to calculate the spread priced by
the Roll Model for each point in time, in order to compare it with
another time series, e.g. the real spread.
Parameters
----------
ts : list
time series of the variable of interest
Returns
-------
list
the spread for each observation.
The length of the output is len(ts)-2,
because the autocorrelation function needs enough data to perform
the necessary calculations.
# spread roll model comparison
spread_pred = fn.roll_model_ts(pt)
#results save in a dataframe
roll_results = pd.DataFrame()
roll_results['observed_spread']=data_1['spread'][3:]
roll_results['roll_model_spread']=spread_pred
roll_results['time']=keys[3:]
# theoretical mid
# getting the bid/ask of the TOB of each book
bid = data_1['bid']
ask = data_1['ask']
# theoretical bid/ask adding the model spread, which is assumed to be constant
t_bid = pt - model_spread
t_ask = pt + model_spread
# assumption test for delta bid/ask that will be discussed in the results
corr_bid = fn.auto_corr_delta(bid)
corr_ask = fn.auto_corr_delta(ask)
# saving results in a dataframe
bid_ask_comp = pd.DataFrame()
bid_ask_comp['theorical_bid'] =t_bid
bid_ask_comp['theorical_ask'] = t_ask
bid_ask_comp['observed_ask'] = ask
bid_ask_comp['observed_bid'] = bid
bid_ask_comp['observed_mid'] = data_1['midprice']
bid_ask_comp['time'] = keys
print(f'Correlation of the bid price changes is: {corr_bid}')
print(f'Correlation of the ask price changes is: {corr_ask}')
print('Comparison between the theoretical mid price and the observed mid price is discussed and shown in Results section 6.5.')
print('Visualizations are included in the results section.')
Correlation of the bid price changes is: -0.0001354837641965043 Correlation of the ask price changes is: -0.00013748501469258305 Comparison between the theoretical mid price and the observed mid price is discussed and shown in Results section 6.5. Visualizations are included in the results section.
display(bid_ask_comp.head(),bid_ask_comp.tail())
| theorical_bid | theorical_ask | observed_ask | observed_bid | observed_mid | time | |
|---|---|---|---|---|---|---|
| 0 | 28272.429971 | 28272.570029 | 28275.0 | 28270.0 | 28272.5 | 2021-07-05T13:06:46.571Z |
| 1 | 28272.429971 | 28272.570029 | 28275.0 | 28270.0 | 28272.5 | 2021-07-05T13:06:47.918Z |
| 2 | 28272.429971 | 28272.570029 | 28275.0 | 28270.0 | 28272.5 | 2021-07-05T13:06:49.414Z |
| 3 | 28276.429971 | 28276.570029 | 28278.0 | 28275.0 | 28276.5 | 2021-07-05T13:06:51.077Z |
| 4 | 28276.429971 | 28276.570029 | 28278.0 | 28275.0 | 28276.5 | 2021-07-05T13:06:52.426Z |
| theorical_bid | theorical_ask | observed_ask | observed_bid | observed_mid | time | |
|---|---|---|---|---|---|---|
| 2396 | 28358.429971 | 28358.570029 | 28362.0 | 28355.0 | 28358.5 | 2021-07-05T14:06:40.583Z |
| 2397 | 28358.429971 | 28358.570029 | 28362.0 | 28355.0 | 28358.5 | 2021-07-05T14:06:41.919Z |
| 2398 | 28358.429971 | 28358.570029 | 28362.0 | 28355.0 | 28358.5 | 2021-07-05T14:06:43.416Z |
| 2399 | 28356.429971 | 28356.570029 | 28359.0 | 28354.0 | 28356.5 | 2021-07-05T14:06:45.070Z |
| 2400 | 28356.429971 | 28356.570029 | 28359.0 | 28354.0 | 28356.5 | 2021-07-05T14:06:46.412Z |
print(f"The proportion of trials that fulfill the martingale property is {mid_exp1['e1']['proporcion']}")
print(f"On the other hand, the proportion that did not fulfill the property is {mid_exp1['e2']['proporcion']}")
print("Therefore the property has a high percentage of fulfillment in empirical terms, but it is not fulfilled 100% of the time.")
display(pd.DataFrame(mid_exp1))
The proportion of trials that fulfill the martingale property is 0.73 On the other hand, the proportion that did not fulfill the property is 0.27 Therefore the property has a high percentage of fulfillment in empirical terms, but it is not fulfilled 100% of the time.
| e1 | e2 | total | |
|---|---|---|---|
| cantidad | 1763.00 | 637.00 | 2400 |
| proporcion | 0.73 | 0.27 | 2400 |
display(mid_exp1_h.head(),mid_exp1_h.tail())
| interval | total | e1 | e2 | proportion e1 | proportion e2 | |
|---|---|---|---|---|---|---|
| 0 | 0 | 40 | 28 | 12 | 0.70 | 0.30 |
| 1 | 1 | 39 | 28 | 11 | 0.72 | 0.28 |
| 2 | 2 | 38 | 30 | 8 | 0.79 | 0.21 |
| 3 | 3 | 40 | 33 | 7 | 0.82 | 0.18 |
| 4 | 4 | 39 | 27 | 12 | 0.69 | 0.31 |
| interval | total | e1 | e2 | proportion e1 | proportion e2 | |
|---|---|---|---|---|---|---|
| 55 | 55 | 39 | 27 | 12 | 0.69 | 0.31 |
| 56 | 56 | 38 | 27 | 11 | 0.71 | 0.29 |
| 57 | 57 | 40 | 31 | 9 | 0.78 | 0.22 |
| 58 | 58 | 39 | 30 | 9 | 0.77 | 0.23 |
| 59 | 59 | 38 | 28 | 10 | 0.74 | 0.26 |
help(vn.boxplot_multi)
Help on function boxplot_multi in module visualizations:
boxplot_multi(df: dict, variables: list, x_ax: str, orient: str, title: str, h: int, xaxes: str, yaxes: str, newnames: dict)
Parameters
----------
df: dict
dataframe (two dimensional dictionary) with the variables of interest.
variables: list
variables or variable you want to plot, a list of strings.
x_ax: str
the variable that will represent the x axis in the plot.
orient: str
chart orientation: v for vertical, h for horizontal.
title: str
The title of the chart.
h: int
height of the chart.
xaxes: str
the name of the x axis.
yaxes: str
the name of the y axis.
newnames: dict
names for the variables that are going to be visible.
vn.boxplot_multi(df=mid_exp1_h,variables=['e1','e2'], x_ax='interval',orient='v',title='Mid Price per minute: Martingale test',
h=600, xaxes='Minutes',yaxes='Proportion',
newnames={'e1':'Martingale test Fullfield', 'e2': 'Martingale test failed'})
In the chart it can be observed that for each minute it is more frequent to find that the property is fulfilled. Also, in terms of proportion, results are similar to the ones found in the previous experiment. The model is not one hundred percent accurate from an empirical perspective, but per minute the scenario where the martingale property holds is consistently more frequent than the one where it fails.
print(f"The proportion of trials that fulfill the martingale property is {wmid_exp2['e1']['proporcion']}")
print(f"On the other hand, the proportion that did not fulfill the property is {wmid_exp2['e2']['proporcion']}")
print("Therefore the property has a high percentage of fulfillment in empirical terms, but it is not fulfilled 100% of the time.")
print("The previous experiment (6.1 results) showed a higher accuracy, but the differences are not wide.")
display(pd.DataFrame(wmid_exp2))
The proportion of trials that fulfill the martingale property is 0.68 On the other hand, the proportion that did not fulfill the property is 0.32 Therefore the property has a high percentage of fulfillment in empirical terms, but it is not fulfilled 100% of the time. The previous experiment (6.1 results) showed a higher accuracy, but the differences are not wide.
| e1 | e2 | total | |
|---|---|---|---|
| cantidad | 1616.00 | 775.00 | 2391 |
| proporcion | 0.68 | 0.32 | 2391 |
display(wmid_exp2_h.head(),wmid_exp2_h.tail())
| interval | total | e1 | e2 | proportion e1 | proportion e2 | |
|---|---|---|---|---|---|---|
| 0 | 0 | 40 | 28 | 12 | 0.70 | 0.30 |
| 1 | 1 | 39 | 26 | 13 | 0.67 | 0.33 |
| 2 | 2 | 38 | 28 | 10 | 0.74 | 0.26 |
| 3 | 3 | 40 | 27 | 13 | 0.68 | 0.32 |
| 4 | 4 | 39 | 26 | 13 | 0.67 | 0.33 |
| | interval | total | e1 | e2 | proportion e1 | proportion e2 |
|---|---|---|---|---|---|---|
| 55 | 55 | 39 | 26 | 13 | 0.67 | 0.33 |
| 56 | 56 | 38 | 27 | 11 | 0.71 | 0.29 |
| 57 | 57 | 40 | 27 | 13 | 0.68 | 0.32 |
| 58 | 58 | 39 | 26 | 13 | 0.67 | 0.33 |
| 59 | 59 | 38 | 26 | 12 | 0.68 | 0.32 |
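A hedged sketch of how such a per-minute table could be assembled with pandas (`df`, `interval`, and the tiny sample are hypothetical stand-ins for the notebook's data):

```python
import pandas as pd

# Hypothetical per-minute grouping sketch: 'interval' is the minute of
# each observation and e1 flags whether the martingale property held.
df = pd.DataFrame({
    "interval": [0, 0, 0, 1, 1, 1],
    "e1":       [1, 0, 1, 1, 1, 0],   # 1 = property fulfilled
})
df["e2"] = 1 - df["e1"]               # complement: property failed

per_minute = df.groupby("interval").agg(
    total=("e1", "size"), e1=("e1", "sum"), e2=("e2", "sum")
)
per_minute["proportion e1"] = per_minute["e1"] / per_minute["total"]
per_minute["proportion e2"] = per_minute["e2"] / per_minute["total"]
print(per_minute)
```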
vn.boxplot_multi(df=wmid_exp2_h,variables=['e1','e2'], x_ax='interval',orient='v',title='Weighted Mid Price per minute: Martingale test',
h=700, xaxes='Minutes',yaxes='Proportion',
newnames={'e1':'Martingale test Fullfield', 'e2': 'Martingale test failed'})
In the chart it is observed that, for each minute, it is more frequent to find that the property is fulfilled for the weighted mid price as well, and in terms of proportion the results are similar to the ones found in the previous experiment. The model is not one hundred percent accurate from an empirical perspective, but per minute the scenario where the martingale property holds is consistently more frequent than the one where it fails.
This matches the total proportion in the previous experiment, which was equal to 0.68. As with the mid price, the model is not accurate 100% of the time from an empirical perspective. Although the proportions are similar, the mid price was more consistent with the model than the weighted mid price.
fig = px.line(roll_results, x ='time', y = ['observed_spread','roll_model_spread'],
title='Observed Spread vs Roll Model Spread')
fig.update_yaxes(title='Spread')
fig.update_xaxes(title='Time')
fig.show()
The model presents important deviations from the observed historical spread. As shown, the model's results are plotted in red. It is noticeable that there is a moment when it converges to the value calculated in section 5.2 for the last spread, 0.0700.
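As a hedged illustration (the variable names here are hypothetical, not the notebook's), the Roll spread behind that flat red line can be estimated from the first-order autocovariance of the price changes: under the model, Cov(Δp_t, Δp_{t-1}) = -c², so c = sqrt(-Cov). A minimal sketch on simulated bid-ask bounce:

```python
import numpy as np

# Roll model sketch: p_t = m_t + c*q_t with q_t i.i.d. in {-1, +1}.
# Then Cov(dp_t, dp_{t-1}) = -c^2, so the half-spread is recoverable
# as c = sqrt(-Cov) whenever that covariance is negative.
def roll_half_spread(prices):
    dp = np.diff(np.asarray(prices, dtype=float))
    cov = np.cov(dp[1:], dp[:-1])[0, 1]     # first-order autocovariance
    return np.sqrt(-cov) if cov < 0 else float("nan")

rng = np.random.default_rng(0)
c_true = 0.035                               # assumed half-spread
q = rng.choice([-1, 1], size=10_000)         # i.i.d. trade directions
trades = 100.0 + c_true * q                  # flat efficient price m_t = 100

c_hat = roll_half_spread(trades)
print(c_hat, 2 * c_hat)                      # half-spread and full spread
```

Note that 2c is the full spread, comparable to the 0.0700 mentioned for section 5.2; a real series would also carry efficient-price innovations u_t, which this flat-price sketch omits.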
fig = px.line(bid_ask_comp, x ='time', y = ['theorical_bid','observed_mid','theorical_ask'],
              title='Roll Model Comparison: Theoretical Bid Price - Observed Mid Price - Theoretical Ask Price')
fig.update_xaxes(title='Time')
fig.update_yaxes(title='Price')
fig.show()
The visual comparison between the theoretical bid/ask and the observed mid price stands out for how closely the bid and ask track the observed mid price, with the spread remaining constant through time as the model states. The model is therefore implying that the spread should be a really small value relative to what we saw in the chart above, where the real spread is visualized.
fig = px.line(bid_ask_comp, x ='time', y = ['observed_bid','observed_mid','observed_ask'],
              title='Roll Model Comparison: Observed Bid Price - Observed Mid Price - Observed Ask Price')
fig.update_xaxes(title='Time')
fig.update_yaxes(title='Price')
fig.show()
Looking at the comparison between the observed bid/ask and mid price, it can be appreciated that there is a difference with respect to the theoretical bid/ask visualized in the chart above: the spread is not constant through time and the distance between bid and ask is greater. In other words, from this chart it is inferred that the spread varies through time and can widen or contract.
APT
In conclusion, the APT gives a good framework to model the value of a risky asset over a small window of time, such as these high frequency samples. However, when the model is tested from an empirical approach using real data, it is not one hundred percent accurate. After performing the experiment I could say that these results were consistent not only between the mid price and the weighted mid price over the whole sample, but also when grouping these two variables per minute, which offers a deeper view of the dynamics of prices and the martingale process. Nevertheless, in terms of proportion and frequency it is more likely that the martingale property will be fulfilled.
The reason why I think this model does not always work, or is not 100% accurate, relies on the fact that maybe $P_{t-1}$ does not capture or price all of the asset's information; therefore a source is missing, or a stochastic component representing noise, to help us model moments of uncertainty or irrationality (let's remember that the stochastic discount factor in the model is assumed to be equal to one, and therefore constant).
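That remark can be made explicit: the pricing equation behind the APT result, written without dividends, is an Euler condition, and setting the stochastic discount factor equal to one collapses it to the martingale property tested here.

$$
P_t \;=\; E_t\!\left[\,M_{t+1}\,P_{t+1}\,\right]
\qquad\overset{M_{t+1}\,=\,1}{\Longrightarrow}\qquad
E_t\!\left[\,P_{t+1}\,\right] \;=\; P_t .
$$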
Roll Model
The main contribution of the Roll Model is to price the cost that dealers are willing to charge for transactions in a market that is not driven by orders: the best bid and ask at which we can sell or buy assets are given by them.
Reviewing the empirical results of the model, I would like to talk first about the spread c, which remains constant as the model proposes. But in the end the model is not able to predict results similar to the observed data, which exhibits a noisier and more random behaviour. It is important to mention that the assumption of uncorrelation between the price changes is not accomplished: the value of -0.00014 is near 0, but it expresses a negative relation between price changes and their first lag. The same happens with the changes of the bid and ask prices at the TOB for each order book, with negative autocorrelations of -0.00013.
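For reference, the lag-1 autocorrelation check can be reproduced with pandas; the toy series below is a hypothetical pure bid-ask bounce, which pushes the autocorrelation to its most negative value, whereas the real data gave the much milder -0.00014:

```python
import pandas as pd

# Uncorrelation check sketch: under the Roll assumptions the price
# changes dp_t should be uncorrelated with dp_{t-1}; a pure bid-ask
# bounce (alternating trades at ask and bid) is the extreme opposite.
prices = pd.Series([100.00, 100.07, 100.00, 100.07, 100.00, 100.07])
dp = prices.diff().dropna()
rho1 = dp.autocorr(lag=1)   # Pearson correlation of dp_t with dp_{t-1}
print(rho1)
```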
Another fact I would like to highlight is the difference observed in the charts: the model implies that the theoretical bid/ask should be closer to the mid price and constant, while in the real data this does not happen; the market cost of transaction (the spread) tends to contract or widen across different periods of time.
Finally, this model has limitations to describe the spread of an asset, at least in a high frequency context like the one in which this experiment was realized. One assumption that would be important to revisit for further analysis, or when using this model as the foundation of another, is the probability distribution of the market direction: perhaps the 50/50 split that is assumed could be overestimating or underestimating the probability of occurrence of a buy or a sell. It could be interesting to propose that these probabilities change during periods of market stress or at certain volatility levels; also, the negative autocorrelations give us a hint that past prices could have some influence on the estimation of future prices.
[1] Muñoz (2020). Python project template. https://github.com/iffranciscome/python-project (accessed 2021).
[2] Hasbrouck, J. (2007). Empirical Market Microstructure: The Institutions, Economics, and Econometrics of Securities Trading. New York, USA: Oxford University Press.